A Sequence-Focused Parallelisation of EMBOSS on a Cluster of Workstations

Authors

  • Karl Podesta
  • Martin Crane
  • Heather J. Ruskin
Abstract

A number of individual bioinformatics applications (particularly BLAST and other sequence searching methods) have recently been implemented over clusters of workstations to take advantage of extra processing power. These implementations achieve performance improvements for increasingly large sets of input data (sequences and databases). We present an analysis of programs in the EMBOSS suite based on increasing sequence size, and implement these programs in parallel over a cluster of workstations using sequence segmentation with overlap. We observe general increases in runtime for all programs, and examine the speedup for the most intensive ones to establish an optimum segmentation size for those programs across the cluster.
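
The abstract names sequence segmentation with overlap but gives no detail of how the segments are formed. The following is a minimal sketch of the general idea only, not the authors' implementation: the function name `segment_with_overlap` and the parameters `segment_size` and `overlap` are assumptions introduced here for illustration.

```python
def segment_with_overlap(sequence, segment_size, overlap):
    """Split a sequence into segments that share `overlap` characters.

    Hypothetical helper for illustration; returns a list of
    (start_offset, segment) pairs so results can be mapped back
    to positions in the original sequence.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    if not 0 <= overlap < segment_size:
        raise ValueError("overlap must satisfy 0 <= overlap < segment_size")
    if not sequence:
        return []

    step = segment_size - overlap  # distance between segment starts
    segments = []
    start = 0
    while True:
        segments.append((start, sequence[start:start + segment_size]))
        # Stop once the current segment reaches the end of the sequence.
        if start + segment_size >= len(sequence):
            break
        start += step
    return segments


if __name__ == "__main__":
    # Each segment could then be handed to a separate worker node.
    for offset, segment in segment_with_overlap("ACGTACGTAC", 4, 1):
        print(offset, segment)
```

In a sketch like this, the overlap is what lets a feature that straddles a segment boundary still appear whole in at least one segment, provided the overlap is at least one character shorter than the longest feature of interest. The start offsets allow per-segment results to be translated back to coordinates in the full sequence.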

Similar resources

An overview of the wcd EST clustering tool

The wcd system is an open source tool for clustering expressed sequence tags (EST) and other DNA and RNA sequences. wcd allows efficient all-versus-all comparison of ESTs using either the d(2) distance function or edit distance, improving existing implementations of d(2). It supports merging, refinement and reclustering of clusters. It is 'drop in' compatible with the StackPack clust...

Website Update: A New Graphical User Interface to EMBOSS

EMBOSS, the European Molecular Biology Open Software Suite, is a collection of over 150 bioinformatics programs. It provides individual applications for retrieval, editing, and analysis of nucleotide and peptide sequences. It also includes software for analysing protein structure. Distributed under the GNU General Public License software agreements [6], all programs are available, free of charge...

PARSAR: Parallelisation of a Chirp Scaling Algorithm SAR Processor

A parallel SAR processor is presented in this paper. The target configuration is a cluster of UNIX workstations, which is available at most user sites. This makes it possible to obtain increased computing performance without investing in dedicated hardware.

Semantics-Based Composition of EMBOSS Services with Bio-jETI

Bio-jETI is a framework for model-based, graphical design, execution and management of bioinformatics analysis processes. Formal methodology like automatic service composition extends the framework and, in particular, allows for semantically aware workflow development. In this study we apply the workflow synthesis methodology to the EMBOSS suite of sequence analysis tools. As neither the tool s...

An Analysis Based on Heuristic Method Using CLUSTAL and EMBOSS

Multiple Sequence Alignment (MSA) methods aim to align several sequences together, find similarities or differences, and infer structural, functional or evolutionary relationships. Since the well-known dynamic programming algorithm that solves the multiple sequence alignment problem has exponential time complexity, heuristic approaches are commonly used. In this paper, multiple se...

Publication date: 2004